Information in Censored Models


by 

Myles  Hollander,  Frank  Proschan,  and  James  Sconing 
Florida  State  University 

Abstract 

Criteria are developed for measuring information in the randomly right-censored
model. Measures which are appropriate include an extension of Shannon's
entropy. The measures are seen to satisfy some fundamental theorems including
(i) the uncensored case is always at least as informative as any censored model,
(ii) information decreases as censoring increases stochastically, and (iii) the
information gain is marginally decreasing.




1.  Introduction. 


Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed positive
random variables corresponding to the true lifetimes of some items on test. With
every $X_i$ there is a corresponding $Y_i$, independent of $X_i$. The $Y_i$'s are also
independent and identically distributed on the positive real line. $Y_i$ is said to be
the censoring variable. The observations consist of the iid pairs $(Z_i, \delta_i)$,
$i = 1, \ldots, n$, where $Z_i = \min(X_i, Y_i)$, $\delta_i = I(X_i \le Y_i)$, and $I(A)$ denotes
the indicator function for the set $A$. This is the randomly right-censored model.
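The sampling scheme just described can be sketched in a few lines of Python. This is only an illustration: the exponential laws for the lifetime and censoring variables, the rates, and the name `simulate_censored_sample` are assumptions made for the example, since the model itself places no parametric form on X or Y.

```python
import random

def simulate_censored_sample(n, life_rate=1.0, censor_rate=0.5, seed=0):
    """Draw n observed pairs (Z_i, delta_i) from the randomly right-censored model.

    The exponential distributions for X and Y are assumed for illustration only.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.expovariate(life_rate)    # true lifetime X_i
        y = rng.expovariate(censor_rate)  # censoring time Y_i, independent of X_i
        z = min(x, y)                     # Z_i = min(X_i, Y_i)
        delta = 1 if x <= y else 0        # delta_i = I(X_i <= Y_i)
        sample.append((z, delta))
    return sample

sample = simulate_censored_sample(1000)
observed_deaths = sum(delta for _, delta in sample)
```

Only `sample` would be available to the analyst; whenever `delta` is 0 the lifetime itself is hidden and only the bound X > Z is known.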

Typically the goal is to make inferences about some property of the distribution
of X. The censoring variable can be thought of as a confounding variable
which inhibits the ability to see X. Suppose it is desired to compare experiments
where different types of censoring may take place and then decide which experiment
is preferred. One approach is to use the "information" in the experiment as a
basis for decision.

The term information was first used by Fisher (1925) to describe the efficiency
of an estimator of some parametric component of the unknown distribution
function.

A more common usage of the term is in the field of communication theory
pioneered by Shannon (1948). Shannon's information can be viewed as a measure of
uncertainty as to the outcome of a random variable. In our paper Shannon's measure
is extended to provide a comparison of experiments in the censored model.

In  extending  Shannon's  measure  to  the  censored  case  and  developing  other  suitable 
measures  of  information,  we  find  that  the  notion  that  more  censoring  should  yield 
less  information  is  fundamental.  This  property  should  hold  for  any  satisfactory 
measure  of  information. 

The property of decreasing information as censoring increases has also been
studied by Lindley (1956), Brooks (1982), and Barlow and Hsiung (1983), all in
the Bayesian context where information is given in terms of expected risk. The
connection between our approach and the Bayesian approach is given by Bernardo
(1979).

Some  satisfactory  notions  of  information  are  developed  in  Sections  2,  3, 
and  4.  Each  information  measure  advanced  is  shown  to  satisfy  Theorems  1.1  and 
1.2  below. 

Theorem 1.1. $E[\text{Information}(X)] \ge E[\text{Information}(Z, \delta)]$ for every X and Y.

Theorem 1.2. $E[\text{Information}(Z_1, \delta_1)] \le E[\text{Information}(Z_2, \delta_2)]$ for every X, where
$(Z_i, \delta_i)$ is the censored variable associated with $Y_i$, $i = 1, 2$, and $Y_1 \le^{st} Y_2$.

The form of the information measure is utilized in the proof. In the discrete
case, information takes the form $E_Y[G]$ where

$G(i) = \text{Information}(X = j \mid Y = i,\ i \ge j) + \text{Information}(X > i \mid Y = i)$.

The first term represents the information in observing the X variable directly.
The second term is the "partial" information in observing only that X is larger
than the observed variable Y. Information is given by taking the expectation
over the Y variable. Theorems 1.1 and 1.2 are proved by first showing
$G(i) \le G(i + 1)$ for every $i > 0$. This says that information is increased if the
experiment is observed for the additional time from i to i + 1. With this preliminary
lemma the theorems follow directly.

While information increases as censoring decreases there are limits to this
increase. Barlow and Hsiung (1983) state "it would be interesting to see when
this (information) gain is marginally decreasing." This leads to the following
theorem.

Theorem 1.3. Let $X^{(i)}$ be the lifetime variable which is censored deterministically
at time i (Type I censoring). Then for sufficiently large i,
$E[\text{Information}(X^{(i)})]$ is a concave increasing function of i.

In  Sections  2,  3,  and  4  various  regularity  conditions  are  imposed  to  obtain 
versions  of  Theorem  1.3  for  the  particular  information  measures  considered. 

In Section 2 Shannon's original measure, entropy, is defined and extended
to the censored case and the three fundamental theorems are proved. In Section
3 a more general class of measures is developed based on the theory of majorization.
Once again the three basic theorems are proved. In Section 4 Shannon's
measure is shown to be inadequate in the continuous case. Several measures are
developed based on the variance of the lifetime variable X and the fundamental
theorems are proved.

2.  Information  in  the  discrete  case. 

Shannon  (1948)  axiomatically  derived  an  information  measure  which  satisfies 
some  intuitive  requirements.  Suppose  a  variable  X  takes  on  only  two  values  with 
probabilities  p  and  1-p.  If  an  information  measure  is  denoted  H(p)  (or  H(X)) 
then  it  should  satisfy  the  following  requirements: 

(i) $H(p) \ge 0$ for all p, $0 \le p \le 1$,

(ii) $H(1/2) = 1$, $H(1) = H(0) = 0$,

(iii) $H(X, Y) = H(X \mid Y) + H(Y)$ for all (X, Y),

where H(X, Y) is the information in the joint experiment (X, Y) and $H(X \mid Y)$ is the
conditional information in the experiment X, given the outcome of experiment Y.
Imposing (i) - (iii) leads to the definition of information as

(2.1) $H(\mathbf{p}) = -\sum_{i=1}^{n} p_i \log_2 p_i$,

where $0 \log_2 0 = 0$ and $\mathbf{p} = (p_1, p_2, \ldots, p_n)$ with $P(X = i) = p_i$. The choice of the
base of the logarithm is unimportant and henceforth the natural logarithm will be
used. The definition can also be extended to the case where
X has a countably infinite number of support points. This measure is termed
entropy.
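As a small illustration of Shannon's entropy as described above, the following Python sketch evaluates it for a finite probability vector and checks requirement (ii). The function name `entropy` and its `base` parameter are assumptions introduced for the example.

```python
import math

def entropy(p, base=math.e):
    """Shannon entropy -sum p_i log p_i, using the convention 0 log 0 = 0."""
    return -sum(p_i * math.log(p_i, base) for p_i in p if p_i > 0)

fair_coin_bits = entropy([0.5, 0.5], base=2)  # requirement (ii): H(1/2) = 1
sure_event = entropy([1.0, 0.0], base=2)      # requirement (ii): H(1) = 0
uniform_nats = entropy([0.25] * 4)            # log 4 when natural logs are used
```

Changing `base` only rescales the result by a constant factor, which is why the choice of base does not matter in the discrete case.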

For the censored model where X and Y have discrete distributions, let
$p_i = P(X = i)$, $q_i = P(Y = i)$, then extend (2.1) to:


Definition 2.1. The information in the discrete experiment (X, Y) is

(2.2) $H(\mathbf{p}, \mathbf{q}) = -\sum_i q_i \Big[ \sum_{j \le i} p_j \log p_j + \bar{P}_{i+1} \log \bar{P}_{i+1} \Big]$,

where $\bar{P}_i = \sum_{j \ge i} p_j$.


Our definition of information in the discrete censored case can be interpreted
as follows. Suppose the censoring variable takes the value i. Then the
information in our observed variable Z is full information, $-p_j \log p_j$, if a death
occurs prior to the censoring time. Otherwise we receive partial information,
$-\bar{P}_{i+1} \log \bar{P}_{i+1}$. Note that if a death and a censorship occur at the same time we
say that a death is observed. The definition follows by averaging over the
censoring times. It is interesting to note that (2.2) is equivalent to Shannon's
mutual information, $H(X) - H(X \mid Z, \delta)$. To see this write $H(X) - H(X \mid Z, \delta)$ as
$-\sum_i p_i \log p_i + \sum_j r_j \sum_i p_{i|j} \log p_{i|j}$, where $r_j$ is the probability that
$(Z, \delta) = j = (j_1, j_2)$, where $j_1 = 1, 2, \ldots$ and $j_2 = 0, 1$. Also $p_{i|j}$ is the conditional
probability that $X = i$ given that $(Z, \delta) = (j_1, j_2)$. The mutual information can be
rewritten as

$\sum_i \sum_j p_{ij} \log (p_{ij} / (p_i r_j))$, where $p_{ij}$ is the joint probability that
$[X = i, (Z, \delta) = (j_1, j_2)]$.

Note that $P[X = i, Z = j, \delta = 0] = p_i q_j$ if $i > j$, 0 otherwise. Also
$P[X = i, Z = j, \delta = 1] = p_j Q_j$ if $i = j$, 0 otherwise, where $Q_j = P(Y \ge j)$. Finally
$P[Z = j, \delta = 0] = q_j \bar{P}_{j+1}$ and $P[Z = j, \delta = 1] = p_j Q_j$. With these probabilities (2.2)
follows from straightforward calculations.
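As a numerical sanity check on the equivalence just described, the sketch below evaluates the censored information of Definition 2.1 and the mutual information between X and (Z, δ) on a small finite support and compares the two. The function names, the restriction to the support {1, ..., n}, and the particular vectors `p` and `q` are assumptions made for illustration.

```python
import math

def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def censored_information(p, q):
    """Definition 2.1 with p[i-1] = P(X = i) and q[i-1] = P(Y = i), i = 1..n."""
    total = 0.0
    for i in range(1, len(p) + 1):
        observed = sum(xlogx(p_j) for p_j in p[:i])  # deaths at times j <= i
        tail = sum(p[i:])                            # P(X > i)
        total += q[i - 1] * (observed + xlogx(tail))
    return -total

def mutual_information(p, q):
    """I(X; (Z, delta)) computed from the joint law of X and (Z, delta)."""
    n = len(p)
    info = 0.0
    for j in range(1, n + 1):
        q_ge_j = sum(q[j - 1:])          # P(Y >= j)
        r_death = p[j - 1] * q_ge_j      # P(Z = j, delta = 1)
        if r_death > 0:
            # X = j is the only value compatible with (Z = j, delta = 1)
            info += r_death * math.log(r_death / (p[j - 1] * r_death))
        r_cens = q[j - 1] * sum(p[j:])   # P(Z = j, delta = 0)
        for i in range(j + 1, n + 1):
            joint = p[i - 1] * q[j - 1]  # P(X = i, Z = j, delta = 0), i > j
            if joint > 0:
                info += joint * math.log(joint / (p[i - 1] * r_cens))
    return info

p = [0.4, 0.3, 0.2, 0.1]
q = [0.1, 0.2, 0.3, 0.4]
h_censored = censored_information(p, q)
h_mutual = mutual_information(p, q)
h_full = -sum(xlogx(p_i) for p_i in p)
```

Putting all censoring mass at the last support point, for instance `q = [0, 0, 0, 1]`, means every death is observed, and the censored information then reduces to the ordinary entropy of `p`.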

Now  with  this  definition  for  information  in  the  discrete  censored  case  we 
show  that  the  three  basic  theorems  stated  in  Section  1  hold. 

Theorem 2.2. $H(\mathbf{p}) \ge H(\mathbf{p}, \mathbf{q})$ for all probability vectors $\mathbf{p}$ and $\mathbf{q}$.

The theorem states that any amount of censoring reduces information. In
order to prove this we first prove Lemma 2.3. This lemma has appeared in the
literature in several different forms. Dobrušin (1963) showed that $H(\mathbf{p}) \ge H(f(\mathbf{p}))$
with equality if and only if $f(\mathbf{p})$ is a one-to-one function. Khinchin (1957)
showed that if $\cup A_i = A$ and $A_i \cap A_j = \emptyset$, $i \ne j$, then $H(p(A_i)) \ge H(p(A))$, where $A_i$
represents a set of support points for X and $p(A_i) = P(X \in A_i)$. We prove the lemma
directly.

Lemma 2.3. $-\sum_j p_j \log p_j \ge -\sum_{j \le i} p_j \log p_j - \bar{P}_{i+1} \log \bar{P}_{i+1}$, for every i.

Proof. Since log x is an increasing function, $\log \bar{P}_{i+1} \ge \log p_j$, $j \ge i + 1$.
Hence, $\sum_j p_j \log p_j \le \sum_{j \le i} p_j \log p_j + \sum_{j \ge i+1} p_j \log \bar{P}_{i+1} = \sum_{j \le i} p_j \log p_j + \bar{P}_{i+1} \log \bar{P}_{i+1}$. ||

We  now  proceed  with  the  proof  of  Theorem  2.2. 

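Proof of Theorem 2.2.

$H(\mathbf{p}, \mathbf{q}) = -\sum_i q_i \Big[ \sum_{j \le i} p_j \log p_j + \bar{P}_{i+1} \log \bar{P}_{i+1} \Big]$

$\le \sum_i q_i \Big( -\sum_j p_j \log p_j \Big) = H(\mathbf{p})$ (from Lemma 2.3). ||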


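Next we compare amounts of information available in two models with different
censoring distributions, one of which is stochastically larger than the
other. First we prove the following lemma.

Lemma 2.4. Let $G_i = \sum_{j \le i} p_j \log p_j + \bar{P}_{i+1} \log \bar{P}_{i+1}$. Then $G_i \ge G_{i+1}$ for $i = 1, 2, \ldots$.

Proof. $G_i - G_{i+1} = -p_{i+1} \log p_{i+1} + \bar{P}_{i+1} \log \bar{P}_{i+1} - \bar{P}_{i+2} \log \bar{P}_{i+2}$

$= p_{i+1}(\log \bar{P}_{i+1} - \log p_{i+1}) + \bar{P}_{i+2}(\log \bar{P}_{i+1} - \log \bar{P}_{i+2}) \ge 0$,
since $\bar{P}_{i+1} \ge p_{i+1}$ and $\bar{P}_{i+1} \ge \bar{P}_{i+2}$. ||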

We  are  now  ready  to  prove  an  analogue  of  Theorem  1.2. 


Theorem 2.5. Let $Y_1 \le^{st} Y_2$. Let $Y_i$ have outcome probability vector $\mathbf{q}_i$, $i = 1, 2$.
Then $H(\mathbf{p}, \mathbf{q}_1) \le H(\mathbf{p}, \mathbf{q}_2)$ for every life distribution vector $\mathbf{p}$.


Proof. From (2.2) we see that $H(\mathbf{p}, \mathbf{q}_i) = E_{Y_i}(-G)$, $i = 1, 2$, where G is the function
defined by $G(i) = G_i$ as in Lemma 2.4. From Lemma 2.4, $-G_i$ is increasing
(nondecreasing) in i. Thus $E_{Y_1}(-G) \le E_{Y_2}(-G)$. ||
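The argument of Lemma 2.4 and Theorem 2.5 can be traced numerically. The sketch below is an illustration under assumed inputs: it computes the sequence of values −G_i for a small life distribution, checks that it is nondecreasing, and compares the censored information under two censoring laws, one stochastically smaller than the other. All function names and the vectors `p`, `q1`, `q2` are hypothetical choices for the example.

```python
import math

def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def neg_G(p):
    """Return [-G_1, ..., -G_n] with G_i as in Lemma 2.4 (support 1..n)."""
    return [-(sum(xlogx(p_j) for p_j in p[:i]) + xlogx(sum(p[i:])))
            for i in range(1, len(p) + 1)]

def censored_information(p, q):
    """H(p, q) written as the expectation of -G_i under the censoring law q."""
    return sum(q_i * g_i for q_i, g_i in zip(q, neg_G(p)))

def stochastically_smaller(q1, q2, tol=1e-12):
    """True if P(Y1 >= k) <= P(Y2 >= k) for every k, i.e. Y1 <=st Y2."""
    return all(sum(q1[k:]) <= sum(q2[k:]) + tol for k in range(len(q1)))

p = [0.4, 0.3, 0.2, 0.1]
q1 = [0.5, 0.3, 0.2, 0.0]   # heavier early censoring
q2 = [0.1, 0.2, 0.3, 0.4]   # lighter censoring
g = neg_G(p)
h1 = censored_information(p, q1)
h2 = censored_information(p, q2)
```

Because `g` is nondecreasing, shifting censoring mass to later times can only raise its expectation, which is the whole content of the proof.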

Thus we see that our intuition has been justified in the simple case where
Shannon's entropy is the measure of information. Theorems 2.2 and 2.5 should
represent a sort of "acid test" for the applicability of any measure of information.

The condition of stochastic domination of the censoring variables is also
necessary. If stochastic domination does not occur then there exists an interval
where $Y_1 \ge^{st} Y_2$ and another interval where $Y_2 \ge^{st} Y_1$. By defining X to have support
only on one of these intervals and applying Theorem 2.5, a contradiction arises.


In  a  similar  fashion  stochastically  decreasing  the  lifetime  variable  X  also 
yields  more  information.  Note  that  (2.1)  and  (2.2)  are  scale  invariant.  By 
relabeling  the  axis  after  stochastically  decreasing  X  and  then  applying  Theorem 
2.5  we  get  analogous  results. 

We  now  establish  a  parallel  for  Theorem  1.3. 

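Theorem 2.6. If there exists a k such that for all $i > k$, $p_i > p_{i+1}$, and $\bar{P}_k < e^{-1}$,
then $G_i - G_{i+1}$ is decreasing (nonincreasing) in i, $i > k$.


Proof. It is sufficient to show $(G_{i-1} - G_i) - (G_i - G_{i+1}) \ge 0$.

$(G_{i-1} - G_i) - (G_i - G_{i+1}) = -p_i \log p_i + p_{i+1} \log p_{i+1} + 2\bar{P}_{i+1} \log \bar{P}_{i+1} - \bar{P}_i \log \bar{P}_i - \bar{P}_{i+2} \log \bar{P}_{i+2}$

$\ge 2\bar{P}_{i+1} \log \bar{P}_{i+1} - \bar{P}_i \log \bar{P}_i - \bar{P}_{i+2} \log \bar{P}_{i+2}$ (since $\bar{P}_k < e^{-1}$)

$\ge 2[\bar{P}_{i+1} \log \bar{P}_{i+1} - (\tfrac{1}{2})(\bar{P}_i + \bar{P}_{i+2}) \log((\tfrac{1}{2})(\bar{P}_i + \bar{P}_{i+2}))] \ge 0$. ||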

The  conditions  of  the  theorem  assure  that  the  index  i  is  far  enough  in  the 
right-hand  tail  for  the  marginally  decreasing  property  to  take  hold. 
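The marginally decreasing property can be illustrated numerically. The sketch below assumes a truncated geometric life distribution (an example chosen here, not taken from the text), computes the gain $G_i - G_{i+1}$ from the expression in the proof of Lemma 2.4, and checks that the gains shrink once the index is in the right-hand tail. The name `information_gain` and the values of `r`, `n`, and `k` are hypothetical.

```python
import math

def information_gain(p, i):
    """G_i - G_{i+1} for p[j-1] = P(X = j): the gain from observing time i + 1."""
    p_next = p[i]                 # p_{i+1}
    tail_next = sum(p[i:])        # P(X >= i + 1)
    tail_after = sum(p[i + 1:])   # P(X >= i + 2)
    gain = 0.0
    if p_next > 0:
        gain += p_next * math.log(tail_next / p_next)
    if tail_after > 0:
        gain += tail_after * math.log(tail_next / tail_after)
    return gain

r, n = 0.5, 60
weights = [r ** (j - 1) for j in range(1, n + 1)]
total = sum(weights)
p = [w / total for w in weights]   # truncated geometric, strictly decreasing
k = 3                              # P(X >= 3) is about 0.25, below 1/e
gains = [information_gain(p, i) for i in range(k + 1, 21)]
```

For this example each successive gain is roughly half the previous one, so the information curve rises but flattens as the censoring time moves outward.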


3. Majorization and information.


We wish to consider generalizations of Shannon's entropy measure. In particular
requirement (iii) of Section 2 is not universally accepted, and it is
this requirement which leads to the specific functional form for Shannon's
entropy. By relaxing this assumption we can generalize Shannon's measure. Note
that the information in an event is governed solely by the probability of that
event. Thus information in the event labeled i is given by $f(p_i)$. What types of
measures perform satisfactorily as measures of information? To answer this
question the theory of the majorization ordering is utilized. Majorization is a
powerful tool, useful in proving inequalities. The standard reference on majorization
is Marshall and Olkin (1979). First we state some definitions and preliminary
theorems.


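Definition 3.1. Let $\mathbf{x}, \mathbf{y} \in R^n$, n-dimensional Euclidean space. We say $\mathbf{y}$ majorizes
$\mathbf{x}$ ($\mathbf{y} \succ \mathbf{x}$) if

(1) $\sum_{i=1}^{k} y_{[i]} \ge \sum_{i=1}^{k} x_{[i]}$, $k = 1, \ldots, n - 1$;

and,

(2) $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} x_i$,

where $x_{[1]} \ge x_{[2]} \ge \cdots \ge x_{[n]}$, $y_{[1]} \ge \cdots \ge y_{[n]}$ are the decreasing rearrangements
for $\mathbf{x}$ and $\mathbf{y}$ respectively.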

An  equivalent  definition  of  majorization  is  given  by: 


Definition 3.2. Let $\mathbf{x}, \mathbf{y} \in R^n$. Then $\mathbf{y} \succ \mathbf{x}$ if and only if there exists a doubly
stochastic matrix P, such that $\mathbf{x} = \mathbf{y}P$.


A matrix is doubly stochastic if each row sum and each column sum equals
one. This definition illuminates why majorization is particularly useful in the
study of information under censoring. The $\mathbf{y}$ vector can be thought of as the
probabilities when a censorship has occurred, that is, $(p_1, p_2, \ldots, p_i, \bar{P}_{i+1}, 0, \ldots)$.
The $\mathbf{x}$ vector is the vector of probabilities of the life distribution
$(p_1, p_2, \ldots)$. Note that

(3.1) $(p_1, p_2, \ldots, p_i, \bar{P}_{i+1}, 0, \ldots) \succ (p_1, p_2, \ldots)$

for every i. The doubly stochastic matrix in this case consists of the
conditional probabilities of surviving to time j given that the item was censored
at time i, $i < j$.
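The majorization relation between the censored and uncensored probability vectors can be checked directly from the partial-sum definition. The following sketch is illustrative only: the helper names `majorizes` and `censored_vector`, the tolerance, and the example vector are assumptions, and the life distribution is taken to have finite support so that the vectors have equal length.

```python
def majorizes(y, x, tol=1e-12):
    """Return True if y majorizes x in the sense of Definition 3.1."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs = sorted(x, reverse=True)   # decreasing rearrangement of x
    ys = sorted(y, reverse=True)   # decreasing rearrangement of y
    if abs(sum(xs) - sum(ys)) > tol:
        return False               # condition (2): equal totals
    partial_x = partial_y = 0.0
    for a, b in zip(xs[:-1], ys[:-1]):
        partial_x += a
        partial_y += b
        if partial_y < partial_x - tol:
            return False           # condition (1) fails at this k
    return True

def censored_vector(p, i):
    """(p_1, ..., p_i, P(X > i), 0, ..., 0) for 1 <= i < len(p)."""
    return p[:i] + [sum(p[i:])] + [0.0] * (len(p) - i - 1)

p = [0.4, 0.3, 0.2, 0.1]
checks = [majorizes(censored_vector(p, i), p) for i in range(1, len(p))]
```

Merging the tail probabilities into a single cell concentrates mass, which is why the censored vector always sits higher in the majorization order.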

Functions  which  preserve  (reverse)  the  ordering  of  majorization  are  called 
Schur-convex  (Schur-concave) . 

Definition 3.3. A function $f: R^n \to R^1$ is said to be Schur-convex (Schur-concave)
if $\mathbf{y} \succ \mathbf{x}$ implies $f(\mathbf{y}) \ge (\le) f(\mathbf{x})$.

Theorem 3.4. (Schur, 1923, Ostrowski, 1952). A permutation invariant function $\phi$
is Schur-convex (concave) if and only if $(z_i - z_j)(\phi_{(i)}(\mathbf{z}) - \phi_{(j)}(\mathbf{z})) \ge (\le) 0$, where
$\phi_{(i)}(\mathbf{z})$ is the partial derivative of $\phi$ with respect to $z_i$.

It is useful to identify specific types of functions which can represent
the average information in a random variable.

Theorem 3.5. (Schur, 1923). Let $\phi(\mathbf{x}) = \sum_{i=1}^{n} f(x_i)$ where $f: R \to R$. Then $\phi$ is
Schur-convex (concave) if and only if f is a convex (concave) function.

This provides a basis for constructing information measures with a general
function f. Let the information in the occurrence of a death at time i be represented
by $f(p_i)$. Two possibilities for classes of information measures can be
obtained as follows. Define $A = \{f: f(x)$ is decreasing and $x f(x)$ is concave$\}$
and $B = \{g: g(x)$ is concave and $g(x)/x$ is decreasing$\}$. Then there are the following
two candidates for general information measures.

(3.2) $H_f(\mathbf{p}) = \sum_i p_i f(p_i)$, $f \in A$,

or

(3.3) $H_g(\mathbf{p}) = \sum_i g(p_i)$, $g \in B$.

Solomon (1979) uses a measure similar to (3.3) to measure ecological diversity.
The measure given by (3.2) will be adopted here as it represents the
average information in an experiment and it facilitates the proof of Theorem 3.12.

Definition 3.6. Let $\mathbf{p}$ be the vector of probabilities associated with a life
variable X. Then the "f-type" information measure in X is given by (3.2).

Note that from the definition of the class A, $H_f(\mathbf{p})$ is Schur-concave.

Definition 3.7. Let $\mathbf{p}$ and $\mathbf{q}$ be the probability vectors associated with X and Y.
Then the amount of information in the censored model is defined to be

(3.4) $H_f(\mathbf{p}, \mathbf{q}) = \sum_i q_i \Big[ \sum_{j \le i} p_j f(p_j) + \bar{P}_{i+1} f(\bar{P}_{i+1}) \Big]$,

where $f \in A$.

Choosing $f(x) = -\log x$ gives (2.2); however, (3.4) cannot be obtained as a
measure of Shannon's mutual information. Henceforth, $0 f(0) = 0$ to fix the
location.
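A brief sketch may help fix ideas about "f-type" measures. It evaluates the sum of $p_i f(p_i)$ and its censored counterpart for two choices of f: $f(x) = -\log x$, which recovers the entropy-based measure, and $f(x) = 1 - x$, an example assumed here for illustration (it is decreasing, and $x(1 - x)$ is concave, so the resulting measure is Schur-concave by Theorem 3.5). The function names and test vectors are hypothetical.

```python
import math

def f_information(p, f):
    """Sum of p_i f(p_i), with the convention 0 f(0) = 0."""
    return sum(p_i * f(p_i) for p_i in p if p_i > 0)

def censored_f_information(p, q, f):
    """Average over censoring times i of the f-information of
    (p_1, ..., p_i, P(X > i)), weighted by q[i-1] = P(Y = i)."""
    total = 0.0
    for i in range(1, len(p) + 1):
        merged = p[:i] + [sum(p[i:])]   # tail mass collapsed into one cell
        total += q[i - 1] * f_information(merged, f)
    return total

def shannon(x):
    return -math.log(x)   # recovers the entropy-based measure

def linear(x):
    return 1.0 - x        # assumed example: gives 1 - sum of p_i squared

p = [0.4, 0.3, 0.2, 0.1]
q = [0.1, 0.2, 0.3, 0.4]
full = {f.__name__: f_information(p, f) for f in (shannon, linear)}
censored = {f.__name__: censored_f_information(p, q, f) for f in (shannon, linear)}
```

For both choices the censored value does not exceed the uncensored one, in line with the "acid test" that any amount of censoring should not add information.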


Lemma 3.8. $\sum_j p_j f(p_j) \ge \sum_{j \le i} p_j f(p_j) + \bar{P}_{i+1} f(\bar{P}_{i+1})$ for every i.

Proof. This follows immediately from (3.1) and the fact that $H_f(\mathbf{p})$ is Schur-concave. ||


Theorem 3.9. $H_f(\mathbf{p}) \ge H_f(\mathbf{p}, \mathbf{q})$ for every $\mathbf{p}$ and $\mathbf{q}$.

Proof. The proof is analogous to that of Theorem 2.2. ||


Lemma 3.10. Let G(f) be a function defined by $G(f)(i) = G_i(f) = \sum_{j \le i} p_j f(p_j) + \bar{P}_{i+1} f(\bar{P}_{i+1})$.
Then $G_i(f) \le G_{i+1}(f)$, $i = 1, 2, \ldots$.

Proof. Note that $(p_1, p_2, \ldots, p_i, p_{i+1}, \bar{P}_{i+2}, 0, \ldots) \prec (p_1, p_2, \ldots, p_i, \bar{P}_{i+1}, 0, \ldots)$
and that $H_f$ is Schur-concave. ||

Theorem 3.11. Let $Y_1$ and $Y_2$ be censoring variables with probability vectors $\mathbf{q}_1$
and $\mathbf{q}_2$ respectively. Let $Y_1 \le^{st} Y_2$. Then for every $\mathbf{p}$, $H_f(\mathbf{p}, \mathbf{q}_1) \le H_f(\mathbf{p}, \mathbf{q}_2)$.

Proof. With Lemma 3.10 the proof is analogous to that of Theorem 2.5. ||

Theorem 3.12. If there exists a k such that for all $i > k$, $p_i > p_{i+1}$, then $G_i(f)$
is a concave function of i, $i > k$.

Proof. The proof is similar to the proof in Theorem 2.6. ||

Thus the "f-type" information measures are suitable for the discrete censored
model.


4.  Information  in  the  continuous  case. 


Our goal is to extend our definition to include life distributions which are
continuous. The obvious analogue of Definition 2.1 would be to define
$H(p(x)) = -\int p(x) \log p(x)\,dx$. However, Example 4.1 shows that such a definition is
unsatisfactory.


Example 4.1. Let $p(x) = \lambda e^{-\lambda x}$ for $0 < x < \infty$, $\lambda > 0$, and $p(x) = 0$ otherwise.

Then $H(p(x)) = -\int_0^\infty \lambda e^{-\lambda x}[-\lambda x + \log \lambda]\,dx = 1 - \log \lambda$.

From Example 4.1 it is seen that $H(p(x)) \ge 0$ if and only if $\lambda \le e$. Thus the
base of the logarithm is crucial in determining a key property of information.
Furthermore $H(p(x))$ does not have the scale invariant property present in the
discrete case. Finally note that if $\lambda > e$, then $H(p(x)) < 0$ so that an observation
will decrease our knowledge. All these properties run counter to the properties
which measures of information should possess. Thus $H(p(x))$ as defined above is
unsatisfactory for defining information.
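The closed form in Example 4.1 can be confirmed by direct numerical integration. The sketch below uses a plain midpoint rule; the function name, the step count, the truncation point of the integral, and the chosen rates are assumptions made for the illustration.

```python
import math

def exponential_entropy(lam, steps=100000, upper=50.0):
    """Midpoint-rule approximation of -integral of p(x) log p(x) dx
    for p(x) = lam * exp(-lam * x), truncated at x = upper / lam."""
    h = (upper / lam) / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        log_p = math.log(lam) - lam * x
        total -= math.exp(log_p) * log_p * h
    return total

rates = [0.5, math.e, 5.0]
numeric = [exponential_entropy(lam) for lam in rates]
closed_form = [1.0 - math.log(lam) for lam in rates]
```

The sign change at λ = e shows concretely how rescaling the time axis alone can turn this quantity from positive to negative.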

In order to find a new measure of information, recall the properties Shannon
used to define entropy: (i) $H(p) \ge 0$, (ii) $H(1/2, 1/2) = 1$ and $H(1, 0) = 0$, and (iii)
$H(X, Y) = H(X \mid Y) + H(Y)$. The first two requirements simply fix the scale and thus
are not crucial. It is the third requirement, the so-called additivity criterion,
that is crucial in defining entropy. It is desirable to retain this crucial
property in the continuous case. Restricting consideration to functions of the
form $d(X, EX)$, where $d(\cdot, \cdot)$ is a metric, leads to $H(X) = E(X - EX)^2 = \sigma^2$ (Blyth,
1959). This suggests:

Definition 4.2. Let X be a continuous random variable on the positive real line
with p.d.f. f(x) and finite variance. Then the information in X is defined to
be $H(X) = H(f) = \int_0^\infty (x - \mu)^2 f(x)\,dx$, where $\mu = \int_0^\infty x f(x)\,dx$.

Note that information, in any sense, measures the spread of the distribution.
From this it seems unreasonable to expect any measure of information to be
scale-invariant in the continuous case. Thus when comparing measures of information
care must be taken to use the same scale of measurement. Definition 4.2 gives a
measure of information in the uncensored case. Recall that in the discrete case
there is full information if death occurs prior to censorship, and only partial
information, $-\bar{P}_{i+1} \log \bar{P}_{i+1}$, otherwise. In the case of censoring the only constraint
is that the remaining probabilities sum to $\bar{P}_{i+1}$. Note that among all
discrete probability distributions which have probability $\bar{P}_{i+1}$ remaining, the one
that gives the least amount of information is that which puts its entire remaining
mass at a single point. This would yield information $-\bar{P}_{i+1} \log \bar{P}_{i+1}$. Thus
$-\bar{P}_{i+1} \log \bar{P}_{i+1}$ can be viewed as a type of "worst case" under the probability
constraint. With this "worst case" type of reasoning for the variance measure,
information measures can be developed.

In the continuous case, minimizing information is equivalent to minimizing
$d(X, EX)$. Given that the censorship takes place at time c, the constraint is
that the remaining probability, $\bar{F}(c)$, must be placed in the set $A = \{x: x > c\}$. It
is easy to show that if $c \le EX$, then $d(X, EX)$ is minimized by placing all the
remaining mass at EX. If $c > EX$, then $d(X, EX)$ is minimized by placing all the
mass at c. We now give a definition for information in the continuous censored
case.


Definition 4.3. Let X be a lifetime variable with p.d.f. f(x) and finite variance.
Let Y be a censoring variable with p.d.f. g(y). Let
$Z = (\min(X, Y), I(X \le Y))$ be the observed variable. Then the information in Z is
defined to be:

$H^{(1)}(X, Y) = H^{(1)}(f, g) = \int_0^\infty g(c) \Big[ \int_0^c (x - \mu)^2 f(x)\,dx + (c - \mu)^2 \bar{F}(c) I(c > \mu) \Big]\,dc$;
equivalently,

$H^{(1)}(f, g) = \int_0^\infty g(c) \int_0^c (x - \mu)^2 f(x)\,dx\,dc + \int_\mu^\infty g(c) (c - \mu)^2 \bar{F}(c)\,dc$.
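To make the bracketed term of Definition 4.3 concrete, the sketch below evaluates it at several fixed censoring times c for a unit exponential lifetime, for which μ = 1 and σ² = 1. The exponential choice, the midpoint-rule integrator, and the names `integrate` and `k1` are assumptions introduced for illustration; for this example a direct calculation gives $1 - 2c e^{-c}$ when c > 1.

```python
import math

def integrate(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

def k1(c, f, sf, mu):
    """Bracketed term of Definition 4.3 at censoring time c.

    f is the lifetime density, sf its survival function, mu its mean.
    """
    observed = integrate(lambda x: (x - mu) ** 2 * f(x), 0.0, c)
    # Worst case: censored mass contributes only when c exceeds the mean.
    censored = (c - mu) ** 2 * sf(c) if c > mu else 0.0
    return observed + censored

def density(x):
    return math.exp(-x)    # unit exponential density (assumed example)

def survival(x):
    return math.exp(-x)    # unit exponential survival function

times = [0.5, 1.0, 2.0, 4.0, 8.0]
values = [k1(c, density, survival, 1.0) for c in times]
```

Averaging `k1` over a censoring density g then gives the information of Definition 4.3; the values increase with c and stay below the full variance of 1.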

From this definition results analogous to those of the discrete case are
obtained.

Lemma 4.4. Let $k_c^{(1)} = \int_0^c (x - \mu)^2 f(x)\,dx + (c - \mu)^2 \bar{F}(c) I(c > \mu)$. Then for every $c > 0$,
$\sigma_X^2 \ge k_c^{(1)}$.


Proof. $\sigma^2 - k_c^{(1)} = \int_c^\infty (x - \mu)^2 f(x)\,dx - (c - \mu)^2 \bar{F}(c) I(c > \mu)$.

Case 1. If $c < \mu$, the second term is zero, and $\sigma^2 - k_c^{(1)} = \int_c^\infty (x - \mu)^2 f(x)\,dx \ge 0$.

Case 2. If $c \ge \mu$, then $\sigma^2 - k_c^{(1)} \ge (c - \mu)^2 \int_c^\infty f(x)\,dx - (c - \mu)^2 \bar{F}(c) = 0$. ||


Theorem 4.5. $H^{(1)}(X, Y) \le H(X)$.


Proof. $H(X) = \int_0^\infty (x - \mu)^2 f(x)\,dx = \int_0^\infty g(c) \Big[ \int_0^\infty (x - \mu)^2 f(x)\,dx \Big]\,dc$
$\ge \int_0^\infty g(c) [k_c^{(1)}]\,dc = H^{(1)}(X, Y)$. ||


Lemma 4.6. $k_c^{(1)}$ is increasing in c.

Proof. $dk_c^{(1)}/dc = (c - \mu)^2 f(c) - (c - \mu)^2 f(c) I(c > \mu) + 2\bar{F}(c)(c - \mu) I(c > \mu)$,
which equals $(c - \mu)^2 f(c)$ if $c \le \mu$ and $2\bar{F}(c)(c - \mu)$ if $c > \mu$,
and each expression is positive. ||

Theorem 4.7. Suppose that $Y_1$ and $Y_2$ are censoring variables with d.f.'s $G_1$ and
$G_2$ respectively. Suppose $Y_1 \le^{st} Y_2$. Then $H^{(1)}(X, Y_1) \le H^{(1)}(X, Y_2)$.

Proof. Define a function by $k^{(1)}(c) = k_c^{(1)}$ as defined in Lemma 4.4. Then
$H^{(1)}(X, Y_i) = E_{Y_i}(k^{(1)})$. From Lemma 4.6 the conclusion follows. ||

Definition 4.8. X is said to have an increasing failure rate (IFR) if
$r(t) = f(t)(\bar{F}(t))^{-1}$ is increasing in t.

Theorem 4.9. Let censoring be deterministic at time c and let X be an IFR
variable. If there exists a value A such that f(x) is decreasing for $x > A$, then
for c sufficiently large, $H^{(1)}(X, c)$ is a concave increasing function of c.

Proof. $H^{(1)}(X, c) = k_c^{(1)}$, which is increasing from Lemma 4.6.
Computing the second derivative,

$d^2 k_c^{(1)}/dc^2 = 2(c - \mu) f(c) + (c - \mu)^2 f'(c)$ for $c \le \mu$,
$d^2 k_c^{(1)}/dc^2 = 2\bar{F}(c) - 2(c - \mu) f(c)$ for $c > \mu$.

The first term is negative if f is decreasing; thus we need only consider the
second term. We have $2\bar{F}(c) - 2(c - \mu) f(c) \le 0$ if and only if $(c - \mu)^{-1} \le r(c)$. But
$(c - \mu)^{-1} \to 0$ as $c \to \infty$. Thus if X is IFR and c is sufficiently large, then the
inequality holds. ||

Theorem 4.9 shows that more censoring yields less information; however, this
relationship is not as strong as one would like. Consider two censoring distributions
$G_1$ and $G_2$, where $G_1$ is stochastically larger than $G_2$ up to time $\mu$ and
equal thereafter. Then the difference in information reduces to
$\int_0^\mu (x - \mu)^2 f(x) (\bar{G}_1(x) - \bar{G}_2(x))\,dx$. This term is positive from Theorem 4.7 but it
merely reflects the information in those observations where a death occurred
under Model 1 and a censorship occurred under Model 2. The difference for the
censored observations is zero even though they are stochastically larger in one
case than in the other. This occurs because all censored observations which
occur prior to time $\mu$ are shifted to $\mu$, regardless of when they actually occur.

An alternate measure is sought which will more carefully distinguish among censored
observations. This can be achieved by a constraint which was previously
ignored, that corresponding to the value of the mean of the distribution, $\mu$.
Again, the "worst case" will be used under this new set of restrictions. Given
that censorship takes place at time c, consider a new variable, $X^\circ$, with p.d.f.
$f^\circ(x)$, which equals f(x) for $x < c$, and minimizes $\int_c^\infty (x - \mu)^2 f^\circ(x)\,dx$, under the
restrictions that $\int_c^\infty f^\circ(x)\,dx = \bar{F}(c)$ and $\int_c^\infty x f^\circ(x)\,dx = \int_c^\infty x f(x)\,dx$. It can be shown that
$f^\circ(x)$ must put all its mass at the point $\alpha(c) = (\bar{F}(c))^{-1} \int_c^\infty x f(x)\,dx$. This gives a
new definition for information.


Definition 4.10. Let X, Y, Z be defined as in Definition 4.3. Then the information
in the random variable Z is defined by

$H^{(2)}(X, Y) = \int_0^\infty g(c) \Big[ \int_0^c (x - \mu)^2 f(x)\,dx + (\alpha(c) - \mu)^2 \bar{F}(c) \Big]\,dc$.
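The role of the conditional mean α(c) in Definition 4.10 can be seen in a small numerical sketch. It assumes a unit exponential lifetime (μ = 1, σ² = 1), for which the memoryless property gives α(c) = c + 1 and the bracketed term works out to $1 - e^{-c}$. The function names, the midpoint-rule integrator, and the finite integration horizon are assumptions introduced for the example.

```python
import math

def integrate(func, a, b, steps=40000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

def alpha(c, f, sf, horizon=60.0):
    """Mean of X given X > c, integrating x f(x) over (c, c + horizon)."""
    return integrate(lambda x: x * f(x), c, c + horizon) / sf(c)

def k2(c, f, sf, mu):
    """Bracketed term of Definition 4.10 at censoring time c."""
    observed = integrate(lambda x: (x - mu) ** 2 * f(x), 0.0, c)
    return observed + (alpha(c, f, sf) - mu) ** 2 * sf(c)

def density(x):
    return math.exp(-x)    # unit exponential density (assumed example)

def survival(x):
    return math.exp(-x)    # unit exponential survival function

times = [0.5, 1.0, 2.0, 4.0]
alphas = [alpha(c, density, survival) for c in times]
values = [k2(c, density, survival, 1.0) for c in times]
```

Unlike the first measure, censored mass is placed at α(c) rather than at max(c, μ), so two censoring times below the mean now give different values.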

Lemma 4.11. Let $k_c^{(2)} = \int_0^c (x - \mu)^2 f(x)\,dx + (\alpha(c) - \mu)^2 \bar{F}(c)$. Then for every $c > 0$,
$\sigma_X^2 \ge k_c^{(2)}$.

Proof. $\sigma^2 - k_c^{(2)} = \int_c^\infty (x - \mu)^2 f(x)\,dx - (\alpha(c) - \mu)^2 \bar{F}(c)$

$= \int_c^\infty (x - \alpha(c))^2 f(x)\,dx + 2(\alpha(c) - \mu) \int_c^\infty (x - \alpha(c)) f(x)\,dx = \int_c^\infty (x - \alpha(c))^2 f(x)\,dx \ge 0$. ||

Theorem 4.12. $H(X) \ge H^{(2)}(X, Y)$, for every X, Y.

Proof. From Lemma 4.11 the proof is the same as that of Theorem 4.5. ||

Lemma 4.13. $k_c^{(2)}$ is increasing in c.

Proof. Direct calculations show that $dk_c^{(2)}/dc = f(c)(c - \alpha(c))^2 \ge 0$. ||

Theorem 4.14. Let X, $Y_1$, $Y_2$ be as in Theorem 4.7. Then $H^{(2)}(X, Y_1) \le H^{(2)}(X, Y_2)$
for every X.

Proof. From Lemma 4.13 the proof follows along the lines of the proof of
Theorem 4.7. ||

Definition 4.15. A random variable is said to have increasing (decreasing) mean
residual life, IMRL (DMRL), if $g(y) = (\bar{F}(y))^{-1} \int_0^\infty \bar{F}(y + t)\,dt$ is increasing (decreasing)
in y.


Theorem 4.16. Suppose censoring is deterministic at time c and X is a DMRL variable.
If there exists a number A such that f(x) is decreasing for all $x > A$ then,
for sufficiently large c, $H^{(2)}(X, c)$ is a concave increasing function of c.

Proof. $H^{(2)}(X, c) = k_c^{(2)}$, which is increasing by Lemma 4.13. Also
$d^2 k_c^{(2)}/dc^2 = 2f(c)(c - \alpha(c))[1 - (\bar{F}(c))^{-2}(-c f(c) \bar{F}(c) + \alpha(c) f(c) \bar{F}(c))] + (c - \alpha(c))^2 f'(c)$.

The second term is negative beyond the point A. The first term is negative if
$f(c)(\bar{F}(c))^{-1} \le (\alpha(c) - c)^{-1}$. We now use the following identity of Meilijson (1971),
$g(c) r(c) = 1 + g'(c)$, where r(c) is the failure rate at c, and $g(c) = \alpha(c) - c$ is the
mean residual life function at c. If X is a DMRL variable then $g'(c) \le 0$. Thus
$g(c) r(c) \le 1$ and the conclusion follows. ||

The  discrete  case  can  be  paralleled  in  one  more  fashion.  The  "worst  case" 
scenario  is  no  longer  used.  Now  the  remaining  mass  F(c)  is  simply  moved  to  the 
point  of  censoring.  Note  that  this  is  not  the  same  as  in  Definition  4.3.  There, 
mass  was  sometimes  displaced  to  the  right.  Here,  it  is  always  displaced  to  the 
left. 


Definition 4.17. Let X, Y, and Z be as in Definition 4.3. Then information in
the variable Z is $H^{(3)}(X, Y) = \int_0^\infty g(c) \big( \sigma^2 F(c) + \int_0^c (x - \mu_c)^2 f^*(x)\,dx \big)\,dc$, where
$\mu_c = (F(c))^{-1} \int_0^c x f^*(x)\,dx$ and $f^*(x) = f(x)$ for $x < c$, $= \bar{F}(x)$ for $x = c$.


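Lemma 4.18. Let $\sigma_c^2$ denote the variance of the truncated density $f^*$. Then
$\sigma_c^2 \le \sigma_X^2$.

Proof. Let $X_1$, $X_2$ be iid copies of X with p.d.f. f(x). Then $2\sigma_X^2 = E(X_1 - X_2)^2$
and $2\sigma_c^2 = E(X_1^c - X_2^c)^2$, where $X_i^c$ is the truncated version of $X_i$, $i = 1, 2$. Then
letting $A = \{X_1 < c, X_2 < c\}$, $B = \{X_1 < c, X_2 \ge c\}$, $C = \{X_1 \ge c, X_2 < c\}$, and
$D = \{X_1 \ge c, X_2 \ge c\}$ we have

$2\sigma_X^2 \ge \iint_A (x_1 - x_2)^2 f(x_1) f(x_2)\,dx_1 dx_2 + \iint_B (x_1 - x_2)^2 f(x_1) f(x_2)\,dx_1 dx_2 + \iint_C (x_1 - x_2)^2 f(x_1) f(x_2)\,dx_1 dx_2$

$\ge \iint_A (x_1 - x_2)^2 f(x_1) f(x_2)\,dx_1 dx_2 + \int_0^c (x_1 - c)^2 f(x_1) \bar{F}(c)\,dx_1 + \int_0^c (c - x_2)^2 f(x_2) \bar{F}(c)\,dx_2 = 2\sigma_c^2$. ||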


Theorem 4.19. $H^{(3)}(X, Y) \le H(X)$.

Proof. From Lemma 4.18 we have
$H^{(3)}(X, Y) = \int_0^\infty g(c) [F(c) \sigma^2 + \bar{F}(c) \sigma_c^2]\,dc$
$\le \int_0^\infty g(c) [F(c) \sigma^2 + \bar{F}(c) \sigma^2]\,dc = H(X)$. ||

Lemma 4.20. $\sigma_c^2$ is increasing in c.

Proof. Let $c_1 < c_2$. It is enough to show $\sigma_{c_1}^2 \le \sigma_{c_2}^2$. Denote the two random variables
as $X_{c_1}$ and $X_{c_2}$; then $X_{c_1}$ can be obtained from $X_{c_2}$ by truncating $X_{c_2}$ at $c_1$.
The desired result follows from Lemma 4.18. ||

Lemma 4.21. Let $L_c = F(c) \sigma_X^2 + \bar{F}(c) \sigma_c^2$. Then $L_c$ is increasing in c.

Proof. $dL_c/dc = f(c)(\sigma_X^2 - \sigma_c^2) + \bar{F}(c)(d\sigma_c^2/dc)$. Now, from Lemma 4.18 and Lemma 4.20,
$dL_c/dc \ge 0$. ||

Theorem 4.22. Let X, $Y_1$, and $Y_2$ be defined as in Theorem 4.7. Then $H^{(3)}(X, Y_1)
\le H^{(3)}(X, Y_2)$.

Proof. From Lemma 4.21, the conclusion follows as in Theorem 4.7. ||

Theorem 4.23. Let X be an IFR variable. Suppose censoring is deterministic at
time c. Suppose there exists a value A such that f(x) is decreasing for all
$x > A$. Then for c sufficiently large, $H^{(3)}(X, c)$ is a concave, increasing function
of c.

Proof. $H^{(3)}(X, c) = L_c$, which is increasing by Lemma 4.21. Also,

$d^2 L_c/dc^2 \le -2f(c)(d\sigma_c^2/dc) + \bar{F}(c)(d^2\sigma_c^2/dc^2)$.

Thus $d^2 L_c/dc^2 \le 0$ if

$f(c)(\bar{F}(c))^{-1} \ge F(c) \Big[ 3 \int_0^c (c - y)^2 f(y)\,dy \Big]^{-1}$.

The term on the right decreases to zero. Hence since X is an IFR variable, the
result holds. ||

It is interesting to note that Rao (1983) also suggests variance as a
measure of ecological diversity. He considers measures of the form
$\iint k(X, Y)\,dP_X\,dP_Y$, where $k(\cdot, \cdot)$ is a kernel measuring the distance between X and
Y. Taking $k(X, Y) = (X - Y)^2$ gives the variance measure.

We also note that alternate proofs of some of our results can be obtained
by using Blackwell's (1951) method for comparing two experiments. For example,
to show that the uncensored case is always at least as informative as any censored
model, let P denote the distribution of the lifetime variable X, Q the
distribution of the independent censoring variable Y. Transform X to $(Z, \delta)$ by
$(Z = X, \delta = 1)$ if $X \le Y^*$, $(Z = Y^*, \delta = 0)$ if $X > Y^*$, where $Y^*$ is independent of X and
has the distribution Q.